SLOPE for Count Responses
STAT 538A Paper Presentation
Refining the Research Focus
Initial Research Direction: Explore proposed penalization by Zilberman & Abramovich (2025).
Problem: The proposed method relies on Lasso and SLOPE as convex surrogates.
Refined Project Focus: This project centers specifically on the performance of SLOPE (Sorted L-One Penalized Estimation) for count data.
Introduction
SLOPE (Sorted L-One Penalized Estimation) is a method for estimating the parameter \(\beta\) in a parametric statistical model.
- Similar to LASSO, SLOPE uses a penalty term based on the \(\ell_{1}\) norm of the estimator \(\hat{\beta}\).
- Unlike LASSO, SLOPE does not apply a single constant \(\lambda\) to every coefficient; it applies a non-increasing sequence of penalties to the coefficients sorted by magnitude.
Comparison of \(\ell_{1}\) Penalties
- Penalty term in LASSO regression: \(\lambda\sum_{i=1}^{p}\left\vert\hat{\beta}_{i}\right\vert\)
- SLOPE penalty term: \(\sum_{i=1}^{p}\lambda_{i}\left\vert\hat{\beta}_{(i)}\right\vert\)
- In this equation, \(\lambda_{1} \ge \lambda_{2} \ge \dots \ge \lambda_{p} \ge 0\), and the elements of \(\hat{\beta}\) are sorted so that \(\left\vert\hat{\beta}_{(1)}\right\vert \ge \dots \ge \left\vert\hat{\beta}_{(p)}\right\vert\).
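The two penalties above can be compared numerically. The following is a minimal sketch using only the Python standard library; the function names `slope_penalty` and `lasso_penalty` and the example values are illustrative assumptions, not part of any SLOPE package.

```python
def slope_penalty(beta, lambdas):
    """Sorted-L1 penalty: sum_i lambdas[i] * |beta|_(i)."""
    if len(beta) != len(lambdas):
        raise ValueError("beta and lambdas must have the same length")
    # The penalty sequence must be non-increasing and non-negative.
    if any(a < b for a, b in zip(lambdas, lambdas[1:])) or min(lambdas) < 0:
        raise ValueError("lambdas must satisfy l1 >= l2 >= ... >= lp >= 0")
    # Sort coefficient magnitudes from largest to smallest.
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lambdas, magnitudes))


def lasso_penalty(beta, lam):
    """Ordinary L1 penalty with a single constant lam."""
    return lam * sum(abs(b) for b in beta)


beta = [0.5, -3.0, 0.0, 2.0]  # hypothetical coefficient vector
print(slope_penalty(beta, [4.0, 3.0, 2.0, 1.0]))  # 4*3 + 3*2 + 2*0.5 + 1*0 = 19.0
print(slope_penalty(beta, [2.0] * 4), lasso_penalty(beta, 2.0))  # 11.0 11.0
```

With a constant sequence the SLOPE penalty reduces to the LASSO penalty, so LASSO is the special case \(\lambda_1 = \dots = \lambda_p\).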
Background - Multiple Hypothesis Testing and FWER
- Example - gene testing with \(n\) patients and \(m > n\) predictors
- Original setting - linear regression with known variance
- FWER - family-wise error rate
- Probability of making at least one type I error
- Bonferroni correction
- Set \(\alpha_{\text{BON}} = \frac{\alpha}{m}\)
Background - FDR
- FDR - false discovery rate
- Expected proportion of false rejections (Type I errors) among all rejections
- Benjamini-Hochberg
- Order the \(p\)-values from smallest to largest
- Find the largest \(j\) such that \(p_{(j)} \leq \frac{qj}{m}\), then reject the hypotheses with the \(j\) smallest \(p\)-values
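The Bonferroni and Benjamini-Hochberg rules can be contrasted on a small example. This is a minimal sketch in standard-library Python; the function names and the \(p\)-values are made up for illustration.

```python
def bonferroni_reject(pvalues, alpha):
    """Reject H_i when p_i <= alpha / m (controls the FWER at level alpha)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, q):
    """Step-up BH rule: find the largest rank j with p_(j) <= q*j/m,
    then reject the hypotheses with the j smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    j_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            j_max = rank  # keep the largest rank that passes its threshold
    reject = [False] * m
    for idx in order[:j_max]:
        reject[idx] = True
    return reject


pvals = [0.001, 0.025, 0.028, 0.2, 0.6]  # hypothetical p-values
print(bonferroni_reject(pvals, alpha=0.05))      # [True, False, False, False, False]
print(benjamini_hochberg_reject(pvals, q=0.05))  # [True, True, True, False, False]
```

In this example the second-smallest \(p\)-value misses its own threshold (0.025 > 0.02) but is still rejected, because the third passes (0.028 ≤ 0.03). That is the step-up behaviour that makes BH less conservative than Bonferroni.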
From Hypothesis Testing to Inference - LASSO
- Model selection can be viewed as multiple hypothesis testing
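- Each predictor gives a null hypothesis \(H_j: \beta_j = 0\); a coefficient estimated as exactly 0 corresponds to not rejecting \(H_j\)
- For orthogonal design matrices, LASSO with a suitably chosen \(\lambda\) is equivalent to Bonferroni
- Orthogonal design matrix: columns are orthonormal, \(X^\top X = I\)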
- LASSO: \(\min \frac 1 2 \|y-X\beta\|_2^2 + \lambda \|\beta\|_1\)
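A short worked derivation of the equivalence, assuming \(X^\top X = I_m\) and Gaussian errors with known standard deviation \(\sigma\) (the symbol \(\tilde{y}\) is introduced here for illustration). Under these assumptions the LASSO problem separates across coordinates and is solved by soft thresholding:

\[
\tilde{y} = X^\top y, \qquad \hat{\beta}_j = \operatorname{sign}(\tilde{y}_j)\,\bigl(|\tilde{y}_j| - \lambda\bigr)_+ , \qquad \hat{\beta}_j \neq 0 \iff |\tilde{y}_j| > \lambda .
\]

Under \(H_j: \beta_j = 0\) we have \(\tilde{y}_j \sim N(0, \sigma^2)\), so choosing

\[
\lambda = \sigma\,\Phi^{-1}\!\left(1 - \frac{\alpha}{2m}\right) \quad\Longrightarrow\quad P\bigl(|\tilde{y}_j| > \lambda\bigr) = \frac{\alpha}{m},
\]

which is the two-sided Bonferroni test at level \(\alpha_{\text{BON}} = \alpha / m\).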
From Hypothesis Testing to Inference - SLOPE
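- What if we use BH-like penalties instead?
- \(\lambda_{\text{BH}}(i) := \Phi^{-1}\left(1-\frac{qi}{2m}\right)\), where \(\Phi^{-1}\) is the standard normal quantile function
- \(\min_{\beta} \frac 1 2 \|y-X\beta\|_2^2 + \sigma \cdot \sum_{i=1}^m \lambda_{\text{BH}}(i)\,|\beta|_{(i)}\), where \(|\beta|_{(1)} \ge \dots \ge |\beta|_{(m)}\)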
- Key differences
- Non-homogeneous penalty
- Sorting of coefficients
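The BH-like penalty sequence is straightforward to compute from standard normal quantiles. A minimal sketch using `statistics.NormalDist` from the Python standard library; the function name `bh_lambda_sequence` and the example values of \(m\) and \(q\) are assumptions for illustration.

```python
from statistics import NormalDist


def bh_lambda_sequence(m, q):
    """Return [lambda_BH(1), ..., lambda_BH(m)] with
    lambda_BH(i) = Phi^{-1}(1 - q*i / (2*m)), for 0 < q < 1."""
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(1 - q * i / (2 * m)) for i in range(1, m + 1)]


lambdas = bh_lambda_sequence(m=5, q=0.1)
print([round(l, 3) for l in lambdas])
# The sequence is strictly decreasing, as the SLOPE penalty requires.
print(all(a > b for a, b in zip(lambdas, lambdas[1:])))  # True
```

The first element, \(\Phi^{-1}(1 - q/(2m))\), coincides with the Bonferroni threshold at level \(q\); later elements are smaller, so coefficients that enter after the first are penalized less.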
SLOPE on orthogonal design matrices
- Proven to control the FDR for orthogonal design matrices with Gaussian errors of known variance
- Convex optimization problem
- Not exactly equivalent to Benjamini-Hochberg
SLOPE in general
- The penalty sequence \(\lambda_1, \dots, \lambda_p\) doesn’t have to be related to BH
- It only needs to obey \(\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p \geq 0\)
- Suggested use
- Use SLOPE for model selection
- Once a model is selected, find coefficients with OLS
Experiments
- Simulate a dataset of \(n = 5000\) observations and a large number of predictors \(p\) to assess the FWER and FDR of SLOPE for different ratios \(p/n\).
- Compare the power and false discovery rates as a function of the number of selected predictors
- Compare different procedures (BH and Bonferroni) for selecting the values of \(\lambda_{1}, \dots, \lambda_{p}\)
- Compare FDR and power of SLOPE when \(\sigma\) is known or unknown
Experimental Design: Proposal
- Research Gap: Original SLOPE paper (Bogdan et al., 2015) only discusses linear models with Gaussian error terms.
- Goal: Use simulations to compare SLOPE with other penalized methods (e.g. LASSO, adaptive LASSO) for modelling count responses (Poisson model).
- Research Question: How does the performance of SLOPE regarding variable selection accuracy (FDR and Power) compare to LASSO and Adaptive LASSO when applied to high-dimensional Poisson regression?
Compared Methods & Penalties
- Objective: Minimize \(- \frac{1}{n} \log L({\beta}; y, X) + \text{Penalty}({\beta})\)
- Penalties:
- SLOPE Penalty: \(\sum_{i=1}^{p}\lambda_{i}|\hat{\beta}|_{(i)}\), \(|\hat{\beta}|_{(1)} \ge ... \ge |\hat{\beta}|_{(p)}\), \(\lambda_1 \ge ... \ge \lambda_p \ge 0\).
- Lasso (L1): \(\lambda ||\hat{\beta}||_{1}\)
- Adaptive Lasso: \(\lambda \sum_{j=1}^{p} w_j |\hat{\beta}_j|\) (where \(w_j = 1/|\hat{\beta}_{init, j}|^{\gamma}\), \(\gamma > 0\))
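For the Poisson model with log link, the objective above can be written out explicitly. The sketch below evaluates the scaled negative log-likelihood and the adaptive-lasso penalty in standard-library Python. The function names, the toy data, and the assumption that every initial estimate is non-zero (for example from a ridge fit) are illustrative, not part of the proposal.

```python
import math


def poisson_neg_loglik(beta0, beta, X, y):
    """-(1/n) * log L for Y_i ~ Poisson(exp(beta0 + x_i . beta))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = beta0 + sum(b * x for b, x in zip(beta, x_i))  # linear predictor
        # log-density of Poisson: y*eta - exp(eta) - log(y!)
        total += y_i * eta - math.exp(eta) - math.lgamma(y_i + 1)
    return -total / len(y)


def adaptive_lasso_penalty(beta, lam, beta_init, gamma=1.0):
    """lam * sum_j w_j |beta_j| with w_j = 1 / |beta_init_j|**gamma.
    Assumes every entry of beta_init is non-zero."""
    weights = [1.0 / abs(b0) ** gamma for b0 in beta_init]
    return lam * sum(w * abs(b) for w, b in zip(weights, beta))


# Toy data: 3 observations, 2 predictors (hypothetical values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [0, 1, 2]
beta = [1.0, -2.0]
objective = poisson_neg_loglik(0.0, beta, X, y) + adaptive_lasso_penalty(
    beta, lam=0.1, beta_init=[0.5, 2.0]
)
print(objective)
```

Coefficients with a large initial estimate receive a small weight and are penalized less, which is the intended contrast with the uniform LASSO penalty.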
Experimental Design: Data Generation
- Model: Poisson Regression
- \(Y_i \sim \text{Poisson}(\mu_i)\), writing \(\mu_i\) for the Poisson mean to avoid confusion with the penalty sequence \(\lambda_i\)
- \(\log(\mu_i) = \beta_0 + X_i \beta\)
- Dimensions: \(n = 1000\) observations.
- \(p = 500\) (p < n)
- \(p = 1000\) (p = n)
- \(p = 2000\) (p > n)
- Replications: R = 50 runs per setting.
Experimental Design: Predictors (X)
- Generation: \(X_{ij} \sim N(0, 1)\), columns standardized.
- Correlation Structures:
- Independent: \(\rho = 0\)
- Moderate: \(\rho = 0.5\)
- High: \(\rho = 0.8\)
Experimental Design: True \(\beta\)
- Sparsity \(k = ||\beta||_0\):
- \(k = 10\)
- \(k = 20\)
- \(k = 50\)
- \(k = 100\)
- Signal Strength:
- Simulate “Weak” \(\beta\) scenarios.
- Simulate “Strong” \(\beta\) scenarios.
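One way the data-generating design above could be simulated is sketched below in standard-library Python. Several choices are assumptions not fixed by the slides: an equicorrelated structure for \(\rho\), a single common value `signal` for the first \(k\) coefficients, a simple multiplication-based Poisson sampler that is only suitable for moderate means, and no empirical re-standardization of the columns.

```python
import math
import random


def sample_poisson(mu, rng):
    """Draw from Poisson(mu) by multiplying uniforms until the product
    drops below exp(-mu). Only suitable for moderate mu."""
    threshold = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_poisson_data(n, p, k, rho, signal, beta0=0.0, seed=0):
    """Return (X, y, beta) for a sparse Poisson regression.
    Assumed design: Corr(X_ij, X_il) = rho for j != l, unit variances,
    and beta_j = signal for j < k, 0 otherwise."""
    rng = random.Random(seed)
    beta = [signal if j < k else 0.0 for j in range(p)]
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor inducing correlation rho
        row = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)
        ]
        eta = beta0 + sum(b * x for b, x in zip(beta, row))
        X.append(row)
        y.append(sample_poisson(math.exp(eta), rng))
    return X, y, beta


# Small illustrative setting, not one of the settings listed above.
X, y, beta = simulate_poisson_data(n=200, p=50, k=10, rho=0.5, signal=0.1)
print(len(X), len(X[0]), sum(b != 0 for b in beta), min(y))
```

For the full-size settings a vectorized implementation would be preferable, and the "weak" and "strong" scenarios would correspond to different values of `signal`.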
Experimental Design: Parameter Tuning
- Lasso, Adaptive Lasso:
- 10-fold Cross-Validation.
- Select tuning parameter(s) by minimizing Poisson deviance.
- SLOPE:
- Target FDR level \(q = 0.1\).
- Use BH-inspired sequence \(\{\lambda_i\}\).
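The cross-validation criterion for Lasso and Adaptive Lasso can be sketched as follows. The deviance formula is the standard Poisson deviance. The helper `select_lambda` and the fold deviances passed to it are hypothetical placeholders, since the fitting routine itself is not specified here.

```python
import math


def poisson_deviance(y, mu):
    """2 * sum_i [ y_i * log(y_i / mu_i) - (y_i - mu_i) ],
    with the convention y * log(y / mu) = 0 when y = 0."""
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        log_term = y_i * math.log(y_i / mu_i) if y_i > 0 else 0.0
        total += log_term - (y_i - mu_i)
    return 2.0 * total


def select_lambda(cv_deviances):
    """Pick the tuning value with the smallest mean held-out deviance.
    cv_deviances maps each candidate lambda to its per-fold deviances."""
    return min(cv_deviances, key=lambda lam: sum(cv_deviances[lam]) / len(cv_deviances[lam]))


print(poisson_deviance([0, 1, 2], [1.0, 1.0, 2.0]))  # 2.0
# Hypothetical held-out deviances for three candidate values and two folds.
print(select_lambda({0.01: [120.0, 130.0], 0.1: [100.0, 105.0], 1.0: [150.0, 160.0]}))  # 0.1
```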
Experimental Design: Evaluation Metrics
- False Discovery Rate.
- Power.
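Both metrics can be estimated by averaging per-replication quantities: the false discovery proportion (false selections divided by total selections, taken as 0 when nothing is selected) and the proportion of true signals recovered. A minimal sketch in standard-library Python; the function name and the example index sets are illustrative.

```python
def selection_metrics(selected, true_support):
    """Return (FDP, TPP) for one replication.
    FDP = false selections / max(number of selections, 1)
    TPP = true selections / number of true signals"""
    selected, true_support = set(selected), set(true_support)
    false_pos = len(selected - true_support)
    true_pos = len(selected & true_support)
    fdp = false_pos / max(len(selected), 1)
    tpp = true_pos / len(true_support) if true_support else 0.0
    return fdp, tpp


# Two hypothetical replications: (selected indices, true non-zero indices).
runs = [({0, 1, 2, 7}, {0, 1, 2, 3, 4}), (set(), {0, 1, 2, 3, 4})]
results = [selection_metrics(sel, truth) for sel, truth in runs]
fdr_hat = sum(fdp for fdp, _ in results) / len(results)    # estimated FDR
power_hat = sum(tpp for _, tpp in results) / len(results)  # estimated power
print(results[0])  # (0.25, 0.6)
print(fdr_hat)     # 0.125
```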